Skip to content

[2.8] Warn on deploy prepare storage rewrites#4530

Merged
YuanTingHsieh merged 11 commits intoNVIDIA:2.8from
YuanTingHsieh:codex/deploy-prepare-storage-warnings
May 7, 2026
Merged

[2.8] Warn on deploy prepare storage rewrites#4530
YuanTingHsieh merged 11 commits intoNVIDIA:2.8from
YuanTingHsieh:codex/deploy-prepare-storage-warnings

Conversation

@YuanTingHsieh
Copy link
Copy Markdown
Collaborator

@YuanTingHsieh YuanTingHsieh commented May 6, 2026

What changed

  • Relocate Docker server job history and snapshot storage into the mounted workspace, matching the existing K8s server prepare behavior.
  • Share the server storage relocation helper across Docker and K8s prepare paths.
  • Warn when deploy prepare replaces/removes existing resource manager, resource consumer, or launcher configuration that would otherwise be silently discarded.
  • Warn when a present snapshot_persistor cannot be relocated because it does not expose snapshot_persistor.args.storage.args.root_dir; absent snapshot_persistor remains silent.

Why

Containerized deploy prepare rewrites runtime components and, for K8s, relocates server storage to durable workspace storage. Before this change, Docker server prepare left default job/snapshot stores under container-local /tmp/nvflare/..., and both component rewrites and malformed snapshot persistor layouts could silently drop operator intent.

Validation

  • python3 -m py_compile nvflare/tool/deploy/deploy_commands.py tests/unit_test/tool/deploy/deploy_commands_test.py

@YuanTingHsieh YuanTingHsieh changed the title [codex] Warn on deploy prepare storage rewrites [2.8] Warn on deploy prepare storage rewrites May 6, 2026
@YuanTingHsieh YuanTingHsieh marked this pull request as ready for review May 6, 2026 20:38
Copilot AI review requested due to automatic review settings May 6, 2026 20:38
@YuanTingHsieh YuanTingHsieh changed the base branch from main to 2.8 May 6, 2026 20:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves nvflare deploy prepare for containerized runtimes by relocating server-side job/snapshot storage into the mounted workspace (to align Docker with existing K8s behavior) and by surfacing warnings when prepare-time component rewrites would otherwise silently discard operator configuration.

Changes:

  • Relocate Docker server snapshot/job storage paths into the mounted workspace during deploy prepare (matching K8s behavior).
  • Share/extend the server storage relocation helper and add warnings for non-relocatable snapshot persistor shapes.
  • Warn when deploy prepare replaces/removes existing resource manager/consumer and launcher components/config during runtime preparation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
nvflare/tool/deploy/deploy_commands.py Adds Docker server storage relocation, introduces warning helpers for discarded component config, and warns on unexpected snapshot persistor shapes.
tests/unit_test/tool/deploy/deploy_commands_test.py Adds unit tests for Docker server storage relocation and for warning/silence behavior around snapshot persistor shape and component replacement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread nvflare/tool/deploy/deploy_commands.py
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR relocates Docker server job-history and snapshot storage into the mounted workspace (matching the existing K8s behaviour), and adds per-component warnings when deploy prepare silently discards custom resource_manager, resource_consumer, or launcher configuration. It also improves _relocate_server_storage_to_workspace to warn instead of crash when a present snapshot_persistor lacks the expected nested args.storage.args.root_dir key.

  • _prepare_docker now calls _relocate_server_storage_to_workspace for the server role, so Docker and K8s server kits share the same storage-relocation logic.
  • New _warn_for_replaced_components helper inspects components before _patch_resources removes them, emitting warnings for any with a non-builtin path or non-empty args.
  • _relocate_server_storage_to_workspace distinguishes an absent snapshot_persistor (silent) from a present-but-malformed one (warns), and also now catches TypeError to handle null or unexpected types gracefully.

Confidence Score: 5/5

Safe to merge; all code paths are well-guarded and covered by new tests.

The change is narrowly scoped: it adds a read-then-warn pass before component replacement and wires up Docker server storage relocation that was already working for K8s. The warn-before-mutate ordering is correct, the builtins sets are complete for the current launcher catalogue, and the new tests verify both the happy path (relocation succeeds, no spurious warnings) and the error paths (malformed persistor, custom config discarded). The only edge case noted — a JSON-null snapshot_persistor emitting a misleading warning — has no impact on the core relocation or patching logic.

No files require special attention.

Important Files Changed

Filename Overview
nvflare/tool/deploy/deploy_commands.py Adds Docker server storage relocation, a shared _warn_for_replaced_components helper, and improves _relocate_server_storage_to_workspace with a presence-check guard and TypeError handling. Logic is correct; one minor edge case noted for snapshot_persistor: null.
tests/unit_test/tool/deploy/deploy_commands_test.py New tests cover Docker server storage relocation, absence-is-silent semantics, malformed snapshot_persistor warning, custom-config replacement warnings, and the no-warn-for-builtins path. All assertions are accurate and align with the implementation.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / prepare_deployment
    participant PD as _prepare_docker / _prepare_k8s
    participant PR as _patch_resources
    participant WRC as _warn_for_replaced_components
    participant RSS as _relocate_server_storage_to_workspace
    participant FS as resources.json.default

    CLI->>PD: prepare (kit_info, config)
    PD->>PR: (kit_dir, launcher_id, launcher_path, launcher_args)
    PR->>FS: read resources
    PR->>WRC: components, launcher_id, launcher_path
    WRC-->>PR: emit Warning: (if custom resource_manager / resource_consumer / launcher found)
    PR->>FS: write patched resources (passthrough RM + new launcher)

    alt "role == ROLE_SERVER"
        PD->>RSS: (kit_dir, workspace_mount_path)
        RSS->>FS: read patched resources
        alt snapshot_persistor present
            RSS->>RSS: update snapshot_persistor.args.storage.args.root_dir
            note right of RSS: emit Warning if KeyError / TypeError
        end
        RSS->>RSS: update job_manager.args.uri_root
        RSS->>FS: write relocated resources
    end

    PD-->>CLI: result dict
Loading

Reviews (7): Last reviewed commit: "Merge branch '2.8' into codex/deploy-pre..." | Re-trigger Greptile

Comment thread nvflare/tool/deploy/deploy_commands.py Outdated
Comment thread nvflare/tool/deploy/deploy_commands.py Outdated
@YuanTingHsieh YuanTingHsieh merged commit e49655e into NVIDIA:2.8 May 7, 2026
24 checks passed
@YuanTingHsieh YuanTingHsieh deleted the codex/deploy-prepare-storage-warnings branch May 7, 2026 20:51
YuanTingHsieh added a commit that referenced this pull request May 8, 2026
same as #4530

## What changed

- Port the deploy prepare storage relocation and observability warnings
from the 2.8 PR to main.
- Relocate Docker server job history and snapshot storage into the
mounted workspace, matching K8s server prepare behavior.
- Warn when deploy prepare replaces or removes existing resource
manager, resource consumer, or launcher configuration.
- Warn when a present snapshot_persistor cannot be relocated because it
does not expose snapshot_persistor.args.storage.args.root_dir.

## Why

Containerized deploy prepare rewrites runtime components and relocates
server storage. Without these warnings and Docker storage relocation,
operator customization can be silently dropped and server state can
remain on container-local storage.

---------

Co-authored-by: Peter Cnudde <pcnudde@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants